
feat(devx): local dev setup for control plane and full end-to-end flow (MLI-6681)#823

Open
lilyz-ai wants to merge 13 commits into main from lilyz-ai/mli-6681-control-plane-local-devx

@lilyz-ai (Collaborator) commented May 7, 2026

Summary

Adds a complete local development workflow for model-engine so developers can iterate on both control plane code and the full endpoint lifecycle without cloud credentials or prod images.

Control-plane-only mode (make dev-server):

  • Spins up Postgres + Redis via docker-compose
  • LOCAL=true activates fake queue/docker/k8s implementations (mirrors CIRCLECI=true)
  • Full gateway API available at :5000 with auth skipped — no k8s cluster needed

Full end-to-end mode (make dev-server-full + make dev-service-builder + make dev-k8s-cacher):

  • make kind-up + make kind-image creates a local kind cluster and loads model-engine:local into it
  • Service Builder picks up endpoint creation tasks from local Redis and creates real k8s Deployments in kind
  • K8s Cacher polls kind and writes endpoint status back to Redis
  • Echo server (model-engine:local) used as the inference container — no GPU required

Code fixes included:

  • service_builder/celery.py + celery_task_queue_gateway.py: onprem cloud provider now uses redis Celery backend instead of s3 — without this, the Service Builder writes results to Redis but the Gateway looks in S3, leaving endpoints stuck in PENDING
  • dependencies.py: LOCAL=true + cloud_provider=onprem falls through to real OnPremQueueEndpointResourceDelegate instead of the fake
  • env_vars.py: GIT_TAG defaults to "local" when LOCAL=true so k8s templates reference the correct model-engine:local image
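Both halves of the S3/Redis fix above change the same cloud-provider-to-result-backend decision, just in two different modules. One way to keep the writer (Service Builder) and the reader (Gateway) from drifting apart again is a single helper that both `celery_task_queue_gateway.py` and `service_builder/celery.py` call. This is a hypothetical sketch: `result_backend_protocol` and `_BACKEND_BY_PROVIDER` are illustrative names, only the `onprem -> redis` entry comes from this PR, and the `"s3"` fallback merely mirrors the pre-fix onprem behavior. The real mapping for other providers lives in the existing code and is not reproduced here.

```python
from typing import Dict

# Hypothetical shared mapping. Only "onprem" -> "redis" is taken from this PR;
# any provider not listed falls back to the caller-supplied default.
_BACKEND_BY_PROVIDER: Dict[str, str] = {
    "onprem": "redis",
}


def result_backend_protocol(cloud_provider: str, default: str = "s3") -> str:
    """Return the Celery result-backend protocol for a cloud provider.

    If both the gateway and the Service Builder call this, the process that
    writes task results and the process that reads them always agree, which
    is exactly what the stuck-in-PENDING bug violated.
    """
    return _BACKEND_BY_PROVIDER.get(cloud_provider, default)
```

For example, `result_backend_protocol("onprem")` yields `"redis"` on both sides, so a gateway/builder mismatch can no longer be introduced by editing only one module.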

New files:

  • docker-compose.local.yml — Postgres 15 + Redis 7 with healthchecks and persistent volume
  • service_configs/service_config_local.yaml — HMI config for local services
  • model_engine_server/core/configs/local-full.yaml — onprem infra config for kind
  • Makefile — all dev targets in one place

Test plan

  • make dev-up && make dev-migrate && make dev-server — gateway starts, GET /v1/model-endpoints returns 200
  • make kind-up && make kind-image — kind cluster created, model-engine:local loaded
  • make dev-server-full + make dev-service-builder + make dev-k8s-cacher — all three processes start cleanly
  • POST a sync CPU endpoint with the echo server image → pod appears in kubectl --context kind-llm-engine get pods -n model-engine and endpoint transitions to READY
  • Existing unit tests pass: make test
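The manual "endpoint transitions to READY" step in the test plan could be scripted with a small poller. This is a hedged sketch using only the standard library: it assumes `GET /v1/model-endpoints` on the local gateway needs no auth in dev mode and returns JSON shaped like `{"model_endpoints": [{"name": ..., "status": ...}]}`. The helper names and the endpoint name in the usage line are hypothetical, not part of model-engine.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Mapping, Optional


def endpoint_status_from_list(payload: Mapping[str, Any], name: str) -> Optional[str]:
    """Return the status of the endpoint called `name`, or None if absent.

    Assumes the list response shape described above.
    """
    for endpoint in payload.get("model_endpoints", []):
        if endpoint.get("name") == name:
            return endpoint.get("status")
    return None


def fetch_status_over_http(name: str, base_url: str = "http://localhost:5000") -> Optional[str]:
    """Query the local gateway once (auth is skipped in dev mode)."""
    with urllib.request.urlopen(f"{base_url}/v1/model-endpoints", timeout=10) as resp:
        return endpoint_status_from_list(json.load(resp), name)


def wait_for_status(
    get_status: Callable[[], Optional[str]],
    target: str = "READY",
    timeout_s: float = 600.0,
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it returns `target`; raise TimeoutError otherwise.

    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    last: Optional[str] = None
    while clock() < deadline:
        last = get_status()
        if last == target:
            return last
        sleep(poll_s)
    raise TimeoutError(f"endpoint still {last!r} after {timeout_s}s")
```

Usage, once the three processes are running: `wait_for_status(lambda: fetch_status_over_http("echo-test"))`. The timeout keeps a broken Cacher or backend mismatch from hanging the check forever.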

Closes MLI-6681

🤖 Generated with Claude Code

Greptile Summary

  • Adds make dev-server (control-plane only, fake k8s/queue) and make dev-server-full / make dev-service-builder / make dev-k8s-cacher (full kind-based end-to-end) workflows; backing services are Postgres 15 + Redis 7 via a new docker-compose.local.yml.
  • Fixes a backend/broker mismatch for onprem: both celery_task_queue_gateway.py and service_builder/celery.py now correctly use redis as the Celery result backend for cloud_provider == "onprem", and dependencies.py routes LOCAL=true to fake queue delegates for non-onprem configs while still using the real OnPremQueueEndpointResourceDelegate for the full local flow.
  • OTLPMetricExporter is imported inside the shared SDK availability guard in correlation.py; this silently expands the OTel requirement to include opentelemetry-exporter-otlp-proto-grpc, which is only listed in vllm-specific requirements and not in the main requirements.txt.

Confidence Score: 4/5

Safe to merge for the intended local-dev purpose; one P1 issue in correlation.py could silently disable tracing in non-standard environments.

Core bug fixes (celery backend protocol, dependency wiring) are correct and consistent. The P1 in correlation.py is isolated to environments that have opentelemetry-sdk without the OTLP exporter, which is non-standard given the existing requirements layout, so real-world impact is low but the architectural concern is valid.

File to review closely: model-engine/model_engine_server/common/startup_tracing/correlation.py (the OTLPMetricExporter guard import)

Important Files Changed

| Filename | Overview |
|---|---|
| model-engine/model_engine_server/common/startup_tracing/correlation.py | Adds OTLPMetricExporter import to the shared SDK availability guard, silently tightening OTel requirements beyond what main requirements.txt provides |
| model-engine/model_engine_server/infra/gateways/celery_task_queue_gateway.py | Correctly adds onprem to the redis backend_protocol branch, consistent with the service_builder change and fixing the S3/Redis backend mismatch |
| model-engine/model_engine_server/service_builder/celery.py | Correctly extends backend_protocol to use redis for onprem, matching the gateway change and resolving the stuck-in-PENDING bug |
| model-engine/model_engine_server/api/dependencies.py | LOCAL+non-onprem correctly routes to FakeQueueDelegate; LOCAL flag added to Redis task queue branch so control-plane-only dev mode works without onprem config |
| model-engine/model_engine_server/common/env_vars.py | GIT_TAG defaults to 'local' when LOCAL=true and validation check correctly skips the error; prevents startup failure in local dev without GIT_TAG set |
| model-engine/Makefile | Adds all dev targets with both LOCAL_ENV and FULL_LOCAL_ENV configurations; ML_INFRA_SERVICES_CONFIG_PATH is now pinned |
| model-engine/docker-compose.local.yml | Clean Postgres 15 + Redis 7 compose file with healthchecks; Postgres uses persistent volume, Redis is ephemeral (appropriate for local dev) |
| model-engine/model_engine_server/core/configs/local-full.yaml | onprem config for kind with celery_broker_type_redis: true; correctly enables Redis broker+backend path for the full local flow |
| model-engine/service_configs/service_config_local.yaml | Uses cache_redis_onprem_url (correct field) to avoid cloud-provider assertions; previously-noted issue with cache_redis_aws_url has been addressed |
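The dependencies.py routing described above (LOCAL plus a non-onprem config gets the fake delegate, LOCAL plus onprem gets the real one) can be read as a small decision table. This is a hypothetical sketch, not the actual dependency-injection code: the enum values reuse the flowchart's labels rather than real classes, `EXISTING` stands for whatever the unchanged non-local path selects, and treating `CIRCLECI=true` as always fake (even for onprem) is an assumption the PR does not state.

```python
from enum import Enum


class QueueDelegate(str, Enum):
    # Labels follow the flowchart in this PR; they are not real class objects.
    FAKE = "FakeQueueDelegate"
    ONPREM = "OnPremQueueDelegate"
    EXISTING = "unchanged non-local selection"


def select_queue_delegate(local: bool, circleci: bool, cloud_provider: str) -> QueueDelegate:
    """Sketch of the LOCAL / CIRCLECI routing for the queue endpoint delegate."""
    if local and cloud_provider == "onprem":
        # Full end-to-end mode: real delegate so queues and tasks reach kind.
        return QueueDelegate.ONPREM
    if local or circleci:
        # Control-plane-only mode (and, by assumption, CI): no real queues.
        return QueueDelegate.FAKE
    # Production paths are untouched by this PR.
    return QueueDelegate.EXISTING
```

Keeping the onprem check ahead of the generic LOCAL check is what lets one flag serve both `make dev-server` and `make dev-server-full`.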

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph Control-plane-only ["Control-plane-only (make dev-server)"]
        A[LOCAL=true\ncloud_provider=aws] --> B[FakeQueueDelegate]
        A --> C[Redis TaskQueueGateway\nlocalhost:6379]
        A --> D[FakeDockerRepository]
    end

    subgraph Full["Full end-to-end (make dev-server-full + builder + cacher)"]
        E[LOCAL=true\ncloud_provider=onprem] --> F[OnPremQueueDelegate]
        E --> G[Redis TaskQueueGateway\nlocalhost:6379]
        G -->|Celery task| H[Service Builder\nredis broker + redis backend]
        H -->|k8s Deployment| I[kind cluster\nmodel-engine:local]
        I -->|status| J[K8s Cacher\nwrites to Redis]
        J --> G
    end

    subgraph Infra
        K[(Postgres\nlocalhost:5432)]
        L[(Redis\nlocalhost:6379)]
    end

    C --- L
    G --- L
    A --- K
    E --- K

Reviews (10): Last reviewed commit: "Merge branch 'main' into lilyz-ai/mli-66..."

Greptile also left 1 inline comment on this PR.

lilyz-ai and others added 2 commits May 7, 2026 01:59
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a one-command local development workflow for the model engine control
plane so developers can iterate on gateway/service-builder code without
building prod images or touching live infra.

- docker-compose.local.yml: spins up Postgres 15 + Redis 7
- service_configs/service_config_local.yaml: HMI config for local services
- Makefile: dev-up / dev-migrate / dev-server / dev-down / test targets
- LOCAL=true env var now activates fake queue/docker implementations
  (parallel to existing CIRCLECI=true path) and skips GIT_TAG requirement
- README: new "Control Plane Local Setup" section with full walkthrough

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment thread model-engine/service_configs/service_config_local.yaml Outdated
Comment thread model-engine/Makefile Outdated
Comment thread model-engine/Makefile Outdated
…G_PATH

- service_config_local.yaml: switch from cache_redis_aws_url to
  cache_redis_onprem_url so the Redis URL is resolved before the
  cloud_provider assertion fires — fixes startup failure for non-AWS configs
- Makefile: pin ML_INFRA_SERVICES_CONFIG_PATH to default.yaml so local
  dev is not affected by a developer's ambient infra config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
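The lookup order this commit relies on (the onprem URL is checked first, so the AWS-only assertion is never reached) can be sketched as follows. `resolve_cache_redis_url` and the plain-dict config are hypothetical stand-ins for the real service-config object; only the two field names come from this PR.

```python
from typing import Mapping, Optional


def resolve_cache_redis_url(config: Mapping[str, Optional[str]], cloud_provider: str) -> str:
    """Hypothetical sketch of the cache Redis URL resolution order."""
    onprem_url = config.get("cache_redis_onprem_url")
    if onprem_url:
        # Returned before any cloud_provider check, so local configs never
        # trip the AWS-only assertion below.
        return onprem_url
    aws_url = config.get("cache_redis_aws_url")
    if aws_url:
        assert cloud_provider == "aws", "cache_redis_aws_url is only supported on AWS"
        return aws_url
    raise ValueError("no cache Redis URL configured")
```

With `service_config_local.yaml` setting only `cache_redis_onprem_url`, startup succeeds regardless of which `cloud_provider` the ambient infra config reports.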
Comment thread model-engine/README.md
lilyz-ai and others added 2 commits May 7, 2026 02:32
- README: add ML_INFRA_SERVICES_CONFIG_PATH to the manual env-var snippet
  so developers with non-AWS ambient configs don't accidentally hit
  the cloud_provider assertion
- docker-compose.local.yml: mount a named volume for Postgres so the
  database survives dev-down/dev-up cycles

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces the manual until-loops in dev-up with `docker compose up --wait`,
which blocks until healthchecks pass and exits non-zero if they fail —
eliminating the infinite-spin on container crash.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@lilyz-ai (Collaborator, Author) commented May 7, 2026

@greptile review

@lilyz-ai (Collaborator, Author) commented May 7, 2026

/greptile

lilyz-ai and others added 2 commits May 7, 2026 03:07
Extends the local dev setup so the complete control plane → Service Builder
→ k8s inference pod flow can be tested locally without cloud credentials.

Changes:
- local-full.yaml: new onprem infra config pointing to localhost Redis/kind
- dependencies.py: LOCAL=true + cloud_provider=onprem falls through to real
  Redis queue delegate instead of the fake (enabling full k8s flow)
- service_builder/celery.py: fix onprem to use redis backend not s3
- env_vars.py: default GIT_TAG to "local" when LOCAL=true so k8s templates
  reference the correct model-engine:local image loaded into kind
- Makefile: kind-up/kind-down/kind-image targets + dev-server-full,
  dev-service-builder, dev-k8s-cacher targets using FULL_LOCAL_ENV
- README: full end-to-end setup section with step-by-step instructions,
  example endpoint creation, and flow table

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
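The env_vars.py change in this commit can be sketched as a small pure function. `resolve_git_tag`, the dict-based environment, and parsing LOCAL as the string "true" are assumptions for illustration; any other exemptions the real module applies (for example under CI) are not modeled.

```python
from typing import Mapping


def resolve_git_tag(env: Mapping[str, str]) -> str:
    """Hypothetical sketch: an explicit GIT_TAG wins, LOCAL=true defaults to "local"."""
    tag = env.get("GIT_TAG")
    if tag:
        return tag
    if env.get("LOCAL", "").lower() == "true":
        # k8s templates then render model-engine:local, the image loaded into kind.
        return "local"
    # Outside local dev, a missing tag is still a startup error.
    raise ValueError("GIT_TAG environment variable must be set")
```

Because an explicit `GIT_TAG` still takes precedence, a developer can point the kind flow at a differently tagged image without touching the default.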
The gateway's module-level backend_protocol had the same aws/gcp/azure
mapping as service_builder/celery.py. Without this fix, the Service Builder
writes task results to Redis but the Gateway looks in S3, leaving endpoints
stuck in PENDING under the kind-based full local flow.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@lilyz-ai (Collaborator, Author) commented May 7, 2026

@greptile review

@lilyz-ai (Collaborator, Author) commented May 7, 2026

/greptile

@lilyz-ai lilyz-ai changed the title feat(devx): local control plane dev setup (MLI-6681) feat(devx): local dev setup for control plane and full end-to-end flow (MLI-6681) May 7, 2026
lilyz-ai and others added 6 commits May 7, 2026 03:30
The exporter package was imported unconditionally under the OTEL_AVAILABLE
flag which only checked the base SDK, not the exporter. Include it in the
try block so OTEL_AVAILABLE stays False when the exporter is absent, fixing
the ImportError that caused run_unit_tests_server to fail.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…chema gateway

- Reformat correlation.py and celery.py to satisfy black
- Move noqa comment to the from...import( line so ruff F401 is suppressed correctly
- Pass schema_generator=GenerateJsonSchema() (new required kwarg) to
  get_definitions() and get_openapi_path() in live_model_endpoints_schema_gateway,
  creating a fresh instance per route since pydantic rejects reuse

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…oes not have this param

The param was added to fix a local test failure (FastAPI 0.110.0 requires it)
but FastAPI 0.135.1 (pinned in requirements.txt, used by CI) does not accept it,
causing mypy call-arg errors. Revert to the original signature.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…xample

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment on lines +15 to +17
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import (  # noqa: F401
    OTLPMetricExporter,
)

P1 OTLP exporter import tightens OTel availability requirement silently

OTLPMetricExporter is placed in the same try/except ImportError block as the core SDK availability check. This means any environment that has opentelemetry-api + opentelemetry-sdk installed but NOT opentelemetry-exporter-otlp-proto-grpc will now get OTEL_AVAILABLE = False and all trace correlation will silently be skipped. The exporter is only listed in vllm-specific requirements (inference/vllm/requirements.txt), not in the main requirements.txt, making this a fragile dependency for a shared utility. The import should be in its own nested try/except or removed entirely if OTLPMetricExporter isn't actually instantiated in this file.
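The first option Greptile suggests, a nested guard, could look like the sketch below. It assumes correlation.py's core availability check only needs the trace API and SDK; the exact SDK imports in the real file may differ, and `OTLP_EXPORTER_AVAILABLE` is a hypothetical second flag. With this split, a missing exporter package disables only exporter-specific code instead of flipping `OTEL_AVAILABLE` to False, and the ImportError that motivated the earlier commit still cannot escape.

```python
# Core OTel availability: only the API and SDK are required for correlation.
try:
    from opentelemetry import trace  # noqa: F401
    from opentelemetry.sdk.trace import TracerProvider  # noqa: F401

    OTEL_AVAILABLE = True
except ImportError:
    OTEL_AVAILABLE = False

# Optional extra: the OTLP gRPC metric exporter ships separately
# (opentelemetry-exporter-otlp-proto-grpc), so probe it on its own.
OTLP_EXPORTER_AVAILABLE = False
if OTEL_AVAILABLE:
    try:
        from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import (  # noqa: F401
            OTLPMetricExporter,
        )

        OTLP_EXPORTER_AVAILABLE = True
    except ImportError:
        OTLP_EXPORTER_AVAILABLE = False
```

If `OTLPMetricExporter` is never instantiated in correlation.py, Greptile's other option (dropping the import entirely) is simpler still.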
